{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading data into sktime\n",
    "\n",
    "This tutorial outlines time series related file formats and how to load data into sktime.\n",
    "\n",
    "Users can load or convert data into sktime compatible formats in two main ways:\n",
    "\n",
    "* pathway 1: direct loading from time series formats. Data can be loaded directly from a bespoke time series storage format, for instance `.ts` ([see Representing data with .ts files](#ts_files)) or other supported file formats, such as Weka `ARFF` and `.tsv` (see [other existing data sources](#other_file_types)). \n",
    "* pathway 2: loading in-memory, then conversion. sktime provides functions to convert between common in-memory representations, see `AA_datatypes_and_datasets` tutorial. Hence, data can be loaded via any loader utility (e.g., `pandas.read_csv`) first, then converted manually into one of the sktime compatible specifications, and then converted between specifications using the sktime `convert` or `convert_to` utility.\n",
    "\n",
    "The rest of this tutorial provides descriptions of pathway 1, i.e., how to load data from supported file formats.\n",
    "\n",
    "<a id=\"ts_files\"></a>\n",
    "## The .ts file format\n",
    "One common use case is to load locally stored data. To make this easy, the `.ts` file format has been created for representing problems in a standard format for use with sktime. \n",
    "\n",
    "### Representing data with .ts files\n",
    "A .ts file include two main parts:\n",
    "* header information\n",
    "* data\n",
    "\n",
    "The header information is used to facilitate simple representation of the data through including metadata about the structure of the problem. The header contains the following:\n",
    "\n",
    "    @problemName <problem name>\n",
    "    @timeStamps <true/false>\n",
    "    @univariate <true/false>\n",
    "    @classLabel <true/false> <space delimited list of possible class values>\n",
    "    @data\n",
    "\n",
    "The data for the problem should begin after the @data tag. In the simplest case where @timestamps is false, values for a series are expressed in a comma-separated list and the index of each value is relative to its position in the list (0, 1, ..., m). An _instance_ may contain 1 to many dimensions, where instances are line-delimited and dimensions within an instance are colon (:) delimited. For example:\n",
    "\n",
    "    2,3,2,4:4,3,2,2\n",
    "    13,12,32,12:22,23,12,32\n",
    "    4,4,5,4:3,2,3,2\n",
    "\n",
    "This example data has 3 _instances_, corresponding to the three lines shown above. Each instance has 2 _dimensions_ with 4 observations per dimension. For example, the initial instance's first dimension has the timepoint values of 2, 3, 2, 4 and the second dimension has the values 4, 3, 2, 2.\n",
    "\n",
    "Missing readings can be specified using ?. For example, \n",
    "\n",
    "    2,?,2,4:4,3,2,2\n",
    "    13,12,32,12:22,23,12,32\n",
    "    4,4,5,4:3,2,3,2\n",
    "    \n",
    "would indicate the second timepoint value of the initial instance's first dimension is missing. \n",
    "\n",
    "Alternatively, for sparse datasets, readings can be specified by setting @timestamps to true in the header and representing the data with tuples in the form of (timestamp, value) just for the obser. For example, the first instance in the example above could be specified in this representation as:\n",
    "\n",
    "    (0,2),(1,3)(2,2)(3,4):(0,4),(1,3),(2,2),(3,2)\n",
    "\n",
    "Equivalently, the sparser example\n",
    "\n",
    "    2,5,?,?,?,?,?,5,?,?,?,?,4\n",
    "\n",
    "could be represented with just the non-missing timestamps as:\n",
    "\n",
    "    (0,2),(1,5),(7,5),(12,4)\n",
    "\n",
    "When using the .ts file format to store data for timeseries classification problems, the class label for an instance should be specified in the last dimension and @classLabel should be set to true in the header information and be followed by the set of possible class values. For example, if a case consists of a single dimension and has a class value of 1 it would be specified as:\n",
    "\n",
    "     1,4,23,34:1\n",
    "\n",
    "\n",
    "### Loading from .ts file to pandas DataFrame\n",
    "\n",
    "A dataset can be loaded from a .ts file using the following method in sktime.datasets:\n",
    "\n",
    "    load_from_tsfile(full_file_path_and_name, replace_missing_vals_with='NaN')\n",
    "\n",
    "This can be demonstrated using the Arrow Head problem that is included in sktime under sktime/datasets/data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.134330Z",
     "iopub.status.busy": "2020-12-19T14:32:13.133562Z",
     "iopub.status.idle": "2020-12-19T14:32:13.811083Z",
     "shell.execute_reply": "2020-12-19T14:32:13.811445Z"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import sktime\n",
    "from sktime.datasets import load_from_tsfile\n",
    "\n",
    "DATA_PATH = os.path.join(os.path.dirname(sktime.__file__), \"datasets/data\")\n",
    "\n",
    "train_x, train_y = load_from_tsfile(\n",
    "    os.path.join(DATA_PATH, \"ArrowHead/ArrowHead_TRAIN.ts\")\n",
    ")\n",
    "test_x, test_y = load_from_tsfile(\n",
    "    os.path.join(DATA_PATH, \"ArrowHead/ArrowHead_TEST.ts\")\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Train and test partitions of the ArrowHead problem have been loaded into nested dataframes (the `nested_univ` format for panel data) with an associated array of class values. As an example, below are the first 5 rows from the train_x and train_y:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.828436Z",
     "iopub.status.busy": "2020-12-19T14:32:13.823584Z",
     "iopub.status.idle": "2020-12-19T14:32:13.831026Z",
     "shell.execute_reply": "2020-12-19T14:32:13.831523Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dim_0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     -1.963009\n",
       "1     -1.957825\n",
       "2     -1.95614...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0     -1.774571\n",
       "1     -1.774036\n",
       "2     -1.77658...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0     -1.866021\n",
       "1     -1.841991\n",
       "2     -1.83502...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     -2.073758\n",
       "1     -2.073301\n",
       "2     -2.04460...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0     -1.746255\n",
       "1     -1.741263\n",
       "2     -1.72274...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               dim_0\n",
       "0  0     -1.963009\n",
       "1     -1.957825\n",
       "2     -1.95614...\n",
       "1  0     -1.774571\n",
       "1     -1.774036\n",
       "2     -1.77658...\n",
       "2  0     -1.866021\n",
       "1     -1.841991\n",
       "2     -1.83502...\n",
       "3  0     -2.073758\n",
       "1     -2.073301\n",
       "2     -2.04460...\n",
       "4  0     -1.746255\n",
       "1     -1.741263\n",
       "2     -1.72274..."
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_x.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.834947Z",
     "iopub.status.busy": "2020-12-19T14:32:13.834437Z",
     "iopub.status.idle": "2020-12-19T14:32:13.836849Z",
     "shell.execute_reply": "2020-12-19T14:32:13.837412Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['0', '1', '2', '0', '1'], dtype='<U1')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_y[0:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The format of the loaded data can be controlled by the `return_data_type` argument.\n",
    "\n",
    "Allowed strings are identifier strings for sktime compatible `Panel` mtypes, as introduced in the \"datatypes and datasets\" tutorial (`AA_datatypes_and_datasets`).\n",
    "\n",
    "If provided, the loaded in-memory data container will comply with that type specification."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_x, train_y = load_from_tsfile(\n",
    "    os.path.join(DATA_PATH, \"ArrowHead/ArrowHead_TRAIN.ts\"), return_data_type=\"numpy3d\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[[-1.9630089, -1.9578249, -1.9561449, ..., -1.9053929,\n",
       "         -1.9239049, -1.9091529]],\n",
       "\n",
       "       [[-1.7745713, -1.7740359, -1.7765863, ..., -1.7292269,\n",
       "         -1.7756704, -1.7893245]],\n",
       "\n",
       "       [[-1.8660211, -1.8419912, -1.8350253, ..., -1.8625124,\n",
       "         -1.8633682, -1.8464925]],\n",
       "\n",
       "       ...,\n",
       "\n",
       "       [[-2.1308119, -2.1044297, -2.0747549, ..., -2.0340977,\n",
       "         -2.0800313, -2.103448 ]],\n",
       "\n",
       "       [[-1.8803376, -1.8626622, -1.8496866, ..., -1.8485336,\n",
       "         -1.8640342, -1.8798851]],\n",
       "\n",
       "       [[-1.80105  , -1.7989155, -1.7783754, ..., -1.7965491,\n",
       "         -1.7985443, -1.80105  ]]])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"other_file_types\"></a>\n",
    "## Loading other file formats\n",
    "Researchers who have made timeseries data available have used two other common formats, including:\n",
    "\n",
    "+ Weka ARFF files\n",
    "+ UCR .tsv files\n",
    "\n",
    "\n",
    "### Loading from Weka ARFF files\n",
    "\n",
    "It is also possible to load data from Weka's attribute-relation file format (ARFF) files. Data for timeseries problems are made available in this format by researchers at the University of East Anglia (among others) at www.timeseriesclassification.com. The `load_from_arff_to_dataframe` method in `sktime.datasets` supports reading data for both univariate and multivariate timeseries problems. \n",
    "\n",
    "The univariate functionality is demonstrated below using data on the ArrowHead problem again (this time loading from ARFF file)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.840562Z",
     "iopub.status.busy": "2020-12-19T14:32:13.840050Z",
     "iopub.status.idle": "2020-12-19T14:32:13.869367Z",
     "shell.execute_reply": "2020-12-19T14:32:13.869937Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dim_0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     -1.963009\n",
       "1     -1.957825\n",
       "2     -1.95614...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0     -1.774571\n",
       "1     -1.774036\n",
       "2     -1.77658...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0     -1.866021\n",
       "1     -1.841991\n",
       "2     -1.83502...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     -2.073758\n",
       "1     -2.073301\n",
       "2     -2.04460...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0     -1.746255\n",
       "1     -1.741263\n",
       "2     -1.72274...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               dim_0\n",
       "0  0     -1.963009\n",
       "1     -1.957825\n",
       "2     -1.95614...\n",
       "1  0     -1.774571\n",
       "1     -1.774036\n",
       "2     -1.77658...\n",
       "2  0     -1.866021\n",
       "1     -1.841991\n",
       "2     -1.83502...\n",
       "3  0     -2.073758\n",
       "1     -2.073301\n",
       "2     -2.04460...\n",
       "4  0     -1.746255\n",
       "1     -1.741263\n",
       "2     -1.72274..."
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sktime.datasets import load_from_arff_to_dataframe\n",
    "\n",
    "X, y = load_from_arff_to_dataframe(\n",
    "    os.path.join(DATA_PATH, \"ArrowHead/ArrowHead_TRAIN.arff\")\n",
    ")\n",
    "X.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The multivariate BasicMotions problem is used below to illustrate the ability to read multivariate timeseries data from ARFF files into the sktime format. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.873136Z",
     "iopub.status.busy": "2020-12-19T14:32:13.872669Z",
     "iopub.status.idle": "2020-12-19T14:32:13.946859Z",
     "shell.execute_reply": "2020-12-19T14:32:13.947377Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dim_0</th>\n",
       "      <th>dim_1</th>\n",
       "      <th>dim_2</th>\n",
       "      <th>dim_3</th>\n",
       "      <th>dim_4</th>\n",
       "      <th>dim_5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     0.079106\n",
       "1     0.079106\n",
       "2    -0.903497\n",
       "3...</td>\n",
       "      <td>0     0.394032\n",
       "1     0.394032\n",
       "2    -3.666397\n",
       "3...</td>\n",
       "      <td>0     0.551444\n",
       "1     0.551444\n",
       "2    -0.282844\n",
       "3...</td>\n",
       "      <td>0     0.351565\n",
       "1     0.351565\n",
       "2    -0.095881\n",
       "3...</td>\n",
       "      <td>0     0.023970\n",
       "1     0.023970\n",
       "2    -0.319605\n",
       "3...</td>\n",
       "      <td>0     0.633883\n",
       "1     0.633883\n",
       "2     0.972131\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0     0.377751\n",
       "1     0.377751\n",
       "2     2.952965\n",
       "3...</td>\n",
       "      <td>0    -0.610850\n",
       "1    -0.610850\n",
       "2     0.970717\n",
       "3...</td>\n",
       "      <td>0    -0.147376\n",
       "1    -0.147376\n",
       "2    -5.962515\n",
       "3...</td>\n",
       "      <td>0    -0.103872\n",
       "1    -0.103872\n",
       "2    -7.593275\n",
       "3...</td>\n",
       "      <td>0    -0.109198\n",
       "1    -0.109198\n",
       "2    -0.697804\n",
       "3...</td>\n",
       "      <td>0    -0.037287\n",
       "1    -0.037287\n",
       "2    -2.865789\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0    -0.813905\n",
       "1    -0.813905\n",
       "2    -0.424628\n",
       "3...</td>\n",
       "      <td>0     0.825666\n",
       "1     0.825666\n",
       "2    -1.305033\n",
       "3...</td>\n",
       "      <td>0     0.032712\n",
       "1     0.032712\n",
       "2     0.826170\n",
       "3...</td>\n",
       "      <td>0     0.021307\n",
       "1     0.021307\n",
       "2    -0.372872\n",
       "3...</td>\n",
       "      <td>0     0.122515\n",
       "1     0.122515\n",
       "2    -0.045277\n",
       "3...</td>\n",
       "      <td>0     0.775041\n",
       "1     0.775041\n",
       "2     0.383526\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     0.289855\n",
       "1     0.289855\n",
       "2    -0.669185\n",
       "3...</td>\n",
       "      <td>0     0.284130\n",
       "1     0.284130\n",
       "2    -0.210466\n",
       "3...</td>\n",
       "      <td>0     0.213680\n",
       "1     0.213680\n",
       "2     0.252267\n",
       "3...</td>\n",
       "      <td>0    -0.314278\n",
       "1    -0.314278\n",
       "2     0.018644\n",
       "3...</td>\n",
       "      <td>0     0.074574\n",
       "1     0.074574\n",
       "2     0.007990\n",
       "3...</td>\n",
       "      <td>0    -0.079901\n",
       "1    -0.079901\n",
       "2     0.237040\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0    -0.123238\n",
       "1    -0.123238\n",
       "2    -0.249547\n",
       "3...</td>\n",
       "      <td>0     0.379341\n",
       "1     0.379341\n",
       "2     0.541501\n",
       "3...</td>\n",
       "      <td>0    -0.286006\n",
       "1    -0.286006\n",
       "2     0.208420\n",
       "3...</td>\n",
       "      <td>0    -0.098545\n",
       "1    -0.098545\n",
       "2    -0.023970\n",
       "3...</td>\n",
       "      <td>0     0.058594\n",
       "1     0.058594\n",
       "2     0.175783\n",
       "3...</td>\n",
       "      <td>0    -0.074574\n",
       "1    -0.074574\n",
       "2     0.114525\n",
       "3...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               dim_0  \\\n",
       "0  0     0.079106\n",
       "1     0.079106\n",
       "2    -0.903497\n",
       "3...   \n",
       "1  0     0.377751\n",
       "1     0.377751\n",
       "2     2.952965\n",
       "3...   \n",
       "2  0    -0.813905\n",
       "1    -0.813905\n",
       "2    -0.424628\n",
       "3...   \n",
       "3  0     0.289855\n",
       "1     0.289855\n",
       "2    -0.669185\n",
       "3...   \n",
       "4  0    -0.123238\n",
       "1    -0.123238\n",
       "2    -0.249547\n",
       "3...   \n",
       "\n",
       "                                               dim_1  \\\n",
       "0  0     0.394032\n",
       "1     0.394032\n",
       "2    -3.666397\n",
       "3...   \n",
       "1  0    -0.610850\n",
       "1    -0.610850\n",
       "2     0.970717\n",
       "3...   \n",
       "2  0     0.825666\n",
       "1     0.825666\n",
       "2    -1.305033\n",
       "3...   \n",
       "3  0     0.284130\n",
       "1     0.284130\n",
       "2    -0.210466\n",
       "3...   \n",
       "4  0     0.379341\n",
       "1     0.379341\n",
       "2     0.541501\n",
       "3...   \n",
       "\n",
       "                                               dim_2  \\\n",
       "0  0     0.551444\n",
       "1     0.551444\n",
       "2    -0.282844\n",
       "3...   \n",
       "1  0    -0.147376\n",
       "1    -0.147376\n",
       "2    -5.962515\n",
       "3...   \n",
       "2  0     0.032712\n",
       "1     0.032712\n",
       "2     0.826170\n",
       "3...   \n",
       "3  0     0.213680\n",
       "1     0.213680\n",
       "2     0.252267\n",
       "3...   \n",
       "4  0    -0.286006\n",
       "1    -0.286006\n",
       "2     0.208420\n",
       "3...   \n",
       "\n",
       "                                               dim_3  \\\n",
       "0  0     0.351565\n",
       "1     0.351565\n",
       "2    -0.095881\n",
       "3...   \n",
       "1  0    -0.103872\n",
       "1    -0.103872\n",
       "2    -7.593275\n",
       "3...   \n",
       "2  0     0.021307\n",
       "1     0.021307\n",
       "2    -0.372872\n",
       "3...   \n",
       "3  0    -0.314278\n",
       "1    -0.314278\n",
       "2     0.018644\n",
       "3...   \n",
       "4  0    -0.098545\n",
       "1    -0.098545\n",
       "2    -0.023970\n",
       "3...   \n",
       "\n",
       "                                               dim_4  \\\n",
       "0  0     0.023970\n",
       "1     0.023970\n",
       "2    -0.319605\n",
       "3...   \n",
       "1  0    -0.109198\n",
       "1    -0.109198\n",
       "2    -0.697804\n",
       "3...   \n",
       "2  0     0.122515\n",
       "1     0.122515\n",
       "2    -0.045277\n",
       "3...   \n",
       "3  0     0.074574\n",
       "1     0.074574\n",
       "2     0.007990\n",
       "3...   \n",
       "4  0     0.058594\n",
       "1     0.058594\n",
       "2     0.175783\n",
       "3...   \n",
       "\n",
       "                                               dim_5  \n",
       "0  0     0.633883\n",
       "1     0.633883\n",
       "2     0.972131\n",
       "3...  \n",
       "1  0    -0.037287\n",
       "1    -0.037287\n",
       "2    -2.865789\n",
       "3...  \n",
       "2  0     0.775041\n",
       "1     0.775041\n",
       "2     0.383526\n",
       "3...  \n",
       "3  0    -0.079901\n",
       "1    -0.079901\n",
       "2     0.237040\n",
       "3...  \n",
       "4  0    -0.074574\n",
       "1    -0.074574\n",
       "2     0.114525\n",
       "3...  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X, y = load_from_arff_to_dataframe(\n",
    "    os.path.join(DATA_PATH, \"BasicMotions/BasicMotions_TRAIN.arff\")\n",
    ")\n",
    "X.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading from UCR .tsv Format Files\n",
    "\n",
    "A further option is to load data into sktime from tab separated value (.tsv) files. Researchers at the University of Riverside, California make a variety of timeseries data available in this format at https://www.cs.ucr.edu/~eamonn/time_series_data_2018. \n",
    "\n",
    "The `load_from_ucr_tsv_to_dataframe` method in `sktime.datasets` supports reading  \n",
    "univariate problems. An example with ArrowHead is given below to demonstrate equivalence with loading from the .ts and ARFF file formats."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.958719Z",
     "iopub.status.busy": "2020-12-19T14:32:13.958207Z",
     "iopub.status.idle": "2020-12-19T14:32:13.991444Z",
     "shell.execute_reply": "2020-12-19T14:32:13.992003Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dim_0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     -1.963009\n",
       "1     -1.957825\n",
       "2     -1.95614...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0     -1.774571\n",
       "1     -1.774036\n",
       "2     -1.77658...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0     -1.866021\n",
       "1     -1.841991\n",
       "2     -1.83502...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     -2.073758\n",
       "1     -2.073301\n",
       "2     -2.04460...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0     -1.746255\n",
       "1     -1.741263\n",
       "2     -1.72274...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               dim_0\n",
       "0  0     -1.963009\n",
       "1     -1.957825\n",
       "2     -1.95614...\n",
       "1  0     -1.774571\n",
       "1     -1.774036\n",
       "2     -1.77658...\n",
       "2  0     -1.866021\n",
       "1     -1.841991\n",
       "2     -1.83502...\n",
       "3  0     -2.073758\n",
       "1     -2.073301\n",
       "2     -2.04460...\n",
       "4  0     -1.746255\n",
       "1     -1.741263\n",
       "2     -1.72274..."
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sktime.datasets import load_from_ucr_tsv_to_dataframe\n",
    "\n",
    "X, y = load_from_ucr_tsv_to_dataframe(\n",
    "    os.path.join(DATA_PATH, \"ArrowHead/ArrowHead_TRAIN.tsv\")\n",
    ")\n",
    "X.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"convert\"></a>\n",
    "## Converting between other NumPy and pandas formats\n",
    "\n",
    "To convert loaded formats to other sktime internal formats, use the `convert` or `convert_to` utilities in the `sktime.datatypes` module.\n",
    "For more details, see the tutorial `AA_datatypes_and_datasets`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
